Smart Paper Notes

Discovering Language Model Behaviors with Model-Written Evaluations

Ethan Perez , Sam Ringer , Kamilė Lukošiūtė , Karina Nguyen , Edwin Chen , Scott Heiner , Craig Pettit , Catherine Olsson , Sandipan Kundu , Saurav Kadavath

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-19

As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.


In-context Learning and Induction Heads

Catherine Olsson , Nelson Elhage , Neel Nanda , Nicholas Joseph , Nova DasSarma , Tom Henighan , Ben Mann , Amanda Askell , Yuntao Bai , Anna Chen

Categories: Machine Learning

2022-09-24

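"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.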
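The [A][B] ... [A] -> [B] pattern described in this abstract can be illustrated with a toy lookup. This is only a sketch of the input/output behaviour, not of how attention heads compute it; the function name `induction_predict` is hypothetical and does not come from the paper.

```python
def induction_predict(tokens):
    """Toy illustration of the induction pattern [A][B] ... [A] -> [B].

    Finds the most recent earlier occurrence of the current (last) token
    and returns the token that followed it. Returns None when the current
    token has not been seen before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from right to left, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

A real induction head reaches a similar result through learned attention patterns rather than an explicit scan, so the sketch should be read as a behavioural description only.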


Language Models (Mostly) Know What They Know

Saurav Kadavath , Tom Conerly , Amanda Askell , Tom Henighan , Dawn Drain , Ethan Perez , Nicholas Schiefer , Zac Hatfield Dodds , Nova DasSarma , Eli Tran-Johnson

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-07-11

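We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.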
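Calibration, as used in this abstract, means that stated probabilities match observed accuracy. One common way to quantify it is expected calibration error (ECE); the sketch below assumes equal-width confidence bins and is not taken from the paper, which may measure calibration differently. The function name and the `n_bins` default are illustrative choices.

```python
def expected_calibration_error(probs, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    probs   -- predicted probabilities in [0, 1] that each answer is correct
    correct -- booleans saying whether each answer was actually correct
    """
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))

    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        accuracy = sum(1 for _, c in b if c) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated set of predictions gives an ECE of zero; for example, four answers each rated 0.75 of which exactly three are correct.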


Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud , Arthur Mensch , Jordan Hoffmann , Trevor Cai , Eliza Rutherford , Katie Millican , George van den Driessche , Jean-Baptiste Lespiau , Bogdan Damoc , Aidan Clark

Categories: Natural Language Processing | Machine Learning

2021-12-08

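We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. Despite using 25× fewer parameters, our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly retrofit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.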


A General Language Assistant as a Laboratory for Alignment

Amanda Askell , Yuntao Bai , Anna Chen , Dawn Drain , Deep Ganguli , Tom Henighan , Andy Jones , Nicholas Joseph , Ben Mann , Nova DasSarma

Categories: Natural Language Processing | Machine Learning

2021-12-01

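Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next we investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling. We find that ranked preference modeling performs much better than imitation learning, and often scales more favorably with model size. In contrast, binary discrimination typically performs and scales very similarly to imitation learning. Finally we study a "preference model pre-training" stage of training, with the goal of improving sample efficiency when finetuning on human preferences.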


Learning Transferable Visual Models From Natural Language Supervision

Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark

Categories:

2021-02-26

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
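The zero-shot transfer step mentioned in this abstract (using natural language to reference visual concepts) is commonly realised by comparing an image embedding with the embeddings of candidate captions and picking the closest one. The sketch below assumes cosine similarity and uses made-up two-dimensional vectors in place of real encoder outputs; `cosine` and `zero_shot_classify` are hypothetical helper names, not functions from the released CLIP code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs):
    """Return the label whose caption embedding is most similar to the image.

    image_emb -- embedding of one image (list of floats)
    text_embs -- dict mapping a label to the embedding of its caption
    """
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs (assumption).
    captions = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
    print(zero_shot_classify([0.9, 0.1], captions))
```

No labelled training examples for the target classes are needed at this stage; only a caption per class is embedded.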


Through-life Monitoring of Resource-constrained Systems and Fleets

Felipe Montana , Adam Hartwell , Will Jacobs , Visakan Kadirkamanathan , Andrew R Mills , Tom Clark

Categories: Machine Learning

2023-01-03

A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. The behaviour of a physical system changes over time; a DT must therefore be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies are considered using a production gas turbine engine system to demonstrate the digital representation accuracy for real-world, time-varying physical systems.


Deep Learning for Space Weather Prediction: Bridging the Gap between Heliophysics Data and Theory

John C. Dorelli , Chris Bard , Thomas Y. Chen , Daniel Da Silva , Luiz Fernando Guides dos Santos , Jack Ireland , Michael Kirk , Ryan McGranaghan , Ayris Narock , Teresa Nieves-Chinchilla

Categories: Machine Learning

2022-12-27

Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained by data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics community to take advantage of these advances.


BD-KD: Balancing the Divergences for Online Knowledge Distillation

Ibtihel Amara , Nazanin Sepahvand , Brett H. Meyer , Warren J. Gross , James J. Clark

Categories: Computer Vision

2022-12-25

Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures the proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly aligning the teacher's and student's distributions. Alongside accuracy, critical edge device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
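The forward and reverse divergences mentioned in this abstract can be written out for discrete distributions as below. This is a sketch under stated assumptions: forward KL is taken to mean KL(teacher || student) and reverse KL to mean KL(student || teacher), which is a common convention, and the fixed weight `alpha` is a hypothetical stand-in for the adaptive balancing the abstract describes, whose exact rule is not given here.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def balanced_divergence(teacher, student, alpha=0.5):
    """Weighted mix of forward and reverse KL (alpha is an illustrative constant)."""
    forward = kl(teacher, student)  # KL(teacher || student)
    reverse = kl(student, teacher)  # KL(student || teacher)
    return alpha * forward + (1 - alpha) * reverse


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]
    student = [0.5, 0.3, 0.2]
    print(balanced_divergence(teacher, student, alpha=0.5))
```

Because KL divergence is asymmetric, the two terms generally differ, which is what makes a balance between them meaningful.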


Eigenvalue initialisation and regularisation for Koopman autoencoders

Jack W. Miller , Charles O'Neill , Navid C. Constantinou , Omri Azencot

Categories: Machine Learning

2022-12-23

Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical regularisation approaches suggest initialising weights with small random values and penalising weights to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model which includes an encoder, a Koopman operator layer, and a decoder. These models have been designed to tackle physics-related problems with interpretable dynamics and an ability to incorporate physics-related constraints. However, the majority of existing work employs standard regularisation practices. In our work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored for physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme that samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme that penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic data sets: a driven pendulum and flow past a cylinder; and two real-world problems: ocean surface temperatures and cyclone wind fields. We find on these datasets that eigenloss and eigeninit improve the convergence rate by up to a factor of 5, and that they reduce the cumulative long-term prediction error by up to a factor of 3. Such a finding points to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.
